Mobi-$π$: Mobilizing Your Robot Learning Policy

Yang, Jingyun, Huang, Isabella, Vu, Brandon, Bajracharya, Max, Antonova, Rika, Bohg, Jeannette

arXiv.org Artificial Intelligence

Learned visuomotor policies are capable of performing increasingly complex manipulation tasks. However, most of these policies are trained on data collected from limited robot positions and camera viewpoints. This leads to poor generalization to novel robot positions, which limits the use of these policies on mobile platforms, especially for precise tasks like pressing buttons or turning faucets. In this work, we formulate the policy mobilization problem: find a mobile robot base pose in a novel environment that is in distribution with respect to a manipulation policy trained on a limited set of camera viewpoints. Compared to retraining the policy itself to be more robust to unseen robot base pose initializations, policy mobilization decouples navigation from manipulation and thus does not require additional demonstrations. Crucially, this problem formulation complements existing efforts to improve manipulation policy robustness to novel viewpoints and remains compatible with them. We propose a novel approach for policy mobilization that bridges navigation and manipulation by optimizing the robot's base pose to align with an in-distribution base pose for a learned policy. Our approach utilizes 3D Gaussian Splatting for novel view synthesis, a score function to evaluate pose suitability, and sampling-based optimization to identify optimal robot poses. To understand policy mobilization in more depth, we also introduce the Mobi-$π$ framework, which includes: (1) metrics that quantify the difficulty of mobilizing a given policy, (2) a suite of simulated mobile manipulation tasks based on RoboCasa to evaluate policy mobilization, and (3) visualization tools for analysis. In both our developed simulation task suite and the real world, we show that our approach outperforms baselines, demonstrating its effectiveness for policy mobilization.
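The pose-search loop described above can be sketched as follows. This is a toy stand-in under invented assumptions: the training poses, search bounds, and nearest-neighbor score are made up, and the real method scores novel views rendered with 3D Gaussian Splatting rather than comparing poses directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base poses (x, y, yaw) the manipulation policy saw in training.
train_poses = np.array([[0.0, 0.0, 0.0],
                        [0.1, 0.05, 0.1],
                        [-0.05, 0.1, -0.1]])

def score(pose):
    # Toy stand-in for the learned suitability score: closeness to the
    # nearest training pose (the paper instead scores rendered views).
    return -np.linalg.norm(train_poses - pose, axis=1).min()

def mobilize(n_samples=512):
    # Sampling-based optimization: draw candidate base poses in the
    # navigable region and keep the most in-distribution one.
    candidates = rng.uniform([-1.0, -1.0, -np.pi], [1.0, 1.0, np.pi],
                             size=(n_samples, 3))
    scores = np.array([score(c) for c in candidates])
    return candidates[scores.argmax()]

best_pose = mobilize()
```

The chosen base pose can then be handed to a navigation stack, decoupling where the robot stands from how it manipulates.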


ROPA: Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation

Chen, Jason, Liu, I-Chun Arthur, Sukhatme, Gaurav, Seita, Daniel

arXiv.org Artificial Intelligence

Training robust bimanual manipulation policies via imitation learning requires demonstration data with broad coverage over robot poses, contacts, and scene contexts. However, collecting diverse and precise real-world demonstrations is costly and time-consuming, which hinders scalability. Prior works have addressed this with data augmentation, typically for either eye-in-hand (wrist camera) setups with RGB inputs or for generating novel images without paired actions, leaving augmentation for eye-to-hand (third-person) RGB-D training with new action labels less explored. In this paper, we propose Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation (ROPA), an offline imitation learning data augmentation method that fine-tunes Stable Diffusion to synthesize third-person RGB and RGB-D observations of novel robot poses. Our approach simultaneously generates corresponding joint-space action labels while employing constrained optimization to enforce physical consistency through appropriate gripper-to-object contact constraints in bimanual scenarios. We evaluate our method on 5 simulated and 3 real-world tasks. Our results across 2625 simulation trials and 300 real-world trials demonstrate that ROPA outperforms baselines and ablations, showing its potential for scalable RGB and RGB-D data augmentation in eye-to-hand bimanual manipulation. Our project website is available at: https://ropaaug.github.io/.


Calib3R: A 3D Foundation Model for Multi-Camera to Robot Calibration and 3D Metric-Scaled Scene Reconstruction

Allegro, Davide, Terreran, Matteo, Ghidoni, Stefano

arXiv.org Artificial Intelligence

RELATED WORKS Hand-Eye Calibration: Hand-eye calibration is a well-established problem in robotics that aims to estimate the relative pose between a camera and a robot's end-effector. It is typically addressed by capturing a series of images of a known calibration pattern (e.g., a checkerboard) using a camera rigidly mounted on the robot hand, and using both the images and the corresponding robot poses to compute the camera's extrinsic parameters. Different mathematical formulations exist for solving hand-eye calibration; a widely adopted approach involves solving the equation AX = XB, where X is the unknown rigid transformation describing the pose of the camera with respect to the robot, while A and B denote the relative motions of the end-effector (from robot kinematics) and the camera (from pattern observations), respectively [31], [36]-[38]. Several other approaches have been proposed: Shah [39] formulated a closed-form solution for the hand-eye problem by using an algorithm based on Singular Value Decomposition (SVD) and the Kronecker product to solve for rotation and translation separately, while Li et al. [40] used dual quaternions to solve them simultaneously, overcoming the limitations of the Kronecker product. Wang et al. [23] extended hand-eye calibration to multi-camera setups by incorporating a common reference frame, but required an external motion capture system, limiting its applicability to small setups. Andreff and Heller [41], [42] proposed two similar hand-eye calibration methods that leverage the Structure-from-Motion (SfM) paradigm to estimate camera motion and introduced a formulation for hand-eye calibration that includes a factor to metrically scale camera poses.
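The AX = XB formulation above can be sketched on synthetic motions. This is an illustrative implementation of the Kronecker-product/SVD style of solution the text attributes to Shah [39] (rotation from the nullspace of a stacked linear system, then translation from a linear least squares); the two motions and all numeric values are invented for the demo.

```python
import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def solve_ax_xb(A_list, B_list):
    # Rotation: R_A R_X = R_X R_B  =>  (I kron R_A - R_B^T kron I) vec(R_X) = 0,
    # stacked over all motion pairs; take the nullspace vector via SVD.
    RA = [A[:3, :3] for A in A_list]; tA = [A[:3, 3] for A in A_list]
    RB = [B[:3, :3] for B in B_list]; tB = [B[:3, 3] for B in B_list]
    M = np.vstack([np.kron(np.eye(3), Ra) - np.kron(Rb.T, np.eye(3))
                   for Ra, Rb in zip(RA, RB)])
    RX = np.linalg.svd(M)[2][-1].reshape(3, 3, order="F")
    if np.linalg.det(RX) < 0:
        RX = -RX
    U, _, Vt = np.linalg.svd(RX)   # project the scaled estimate onto SO(3)
    RX = U @ Vt
    # Translation: (R_A - I) t_X = R_X t_B - t_A, stacked over all pairs.
    L = np.vstack([Ra - np.eye(3) for Ra in RA])
    r = np.concatenate([RX @ tb - ta for tb, ta in zip(tB, tA)])
    tX = np.linalg.lstsq(L, r, rcond=None)[0]
    X = np.eye(4); X[:3, :3] = RX; X[:3, 3] = tX
    return X

# Synthetic ground truth and two motions with non-parallel rotation axes.
X_true = np.eye(4)
X_true[:3, :3] = rot_z(0.3) @ rot_x(0.2)
X_true[:3, 3] = [0.1, -0.2, 0.05]
B1 = np.eye(4); B1[:3, :3] = rot_z(0.5); B1[:3, 3] = [0.2, 0.0, 0.1]
B2 = np.eye(4); B2[:3, :3] = rot_x(0.7); B2[:3, 3] = [0.0, 0.3, -0.1]
A1 = X_true @ B1 @ np.linalg.inv(X_true)
A2 = X_true @ B2 @ np.linalg.inv(X_true)
X_est = solve_ax_xb([A1, A2], [B1, B2])
```

At least two motions with non-parallel rotation axes are needed, otherwise the nullspace is not one-dimensional and X is not unique.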


Tree-SLAM: semantic object SLAM for efficient mapping of individual trees in orchards

Rapado-Rincon, David, Kootstra, Gert

arXiv.org Artificial Intelligence

Accurate mapping of individual trees is an important component for precision agriculture in orchards, as it allows autonomous robots to perform tasks like targeted operations or individual tree monitoring. However, creating these maps is challenging because GPS signals are often unreliable under dense tree canopies. Furthermore, standard Simultaneous Localization and Mapping (SLAM) approaches struggle in orchards because the repetitive appearance of trees can confuse the system, leading to mapping errors. To address this, we introduce Tree-SLAM, a semantic SLAM approach tailored for creating maps of individual trees in orchards. Utilizing RGB-D images, our method detects tree trunks with an instance segmentation model, estimates their location and re-identifies them using a cascade-graph-based data association algorithm. These re-identified trunks serve as landmarks in a factor graph framework that integrates noisy GPS signals, odometry, and trunk observations. The system produces maps of individual trees with a geo-localization error as low as 18 cm, which is less than 20% of the planting distance. The proposed method was validated on diverse datasets from apple and pear orchards across different seasons, demonstrating high mapping accuracy and robustness in scenarios with unreliable GPS signals.

Keywords: semantic SLAM, agricultural robotics, multi-object tracking, factor graph

1. Introduction

A significant decline in available agricultural labor presents a challenge for sustaining agricultural production, potentially leading to food losses [1, 2]. Automation and robotics are emerging as key technologies to address these issues, offering the potential to enhance productivity by compensating for labor scarcity and optimizing farm management through data-driven insights [3, 4]. This is particularly relevant in high-value crops such as those found in orchards, where precise operations have the potential to improve efficiency and reduce labor needs. For autonomous robots to perform tasks effectively in orchards, such as targeted spraying or individual tree monitoring, they require a detailed map of the environment and the ability to determine their position within it.
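The factor graph described above, fusing GPS priors, odometry, and trunk observations, can be sketched as a small joint least-squares problem. Everything here is a toy: four 2-D positions, one trunk landmark, Gaussian noise, and hand-picked sigmas, with no orientation and no data association.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Hypothetical ground truth: four robot positions along a row and one trunk.
true_poses = np.array([[0., 0.], [1., 0.], [2., 0.], [3., 0.]])
true_trunk = np.array([1.5, 2.0])

# Simulated measurements for the three factor types in the text.
odom = np.diff(true_poses, axis=0) + rng.normal(0, 0.02, (3, 2))  # odometry
gps = true_poses + rng.normal(0, 0.5, (4, 2))                     # noisy GPS
obs = (true_trunk - true_poses) + rng.normal(0, 0.05, (4, 2))     # trunk obs.

def residuals(x):
    poses, trunk = x[:8].reshape(4, 2), x[8:]
    return np.concatenate([
        ((np.diff(poses, axis=0) - odom) / 0.02).ravel(),  # odometry factors
        ((poses - gps) / 0.5).ravel(),                     # GPS prior factors
        ((trunk - poses - obs) / 0.05).ravel(),            # landmark factors
    ])

x0 = np.concatenate([gps.ravel(), gps[0] + obs[0]])  # initialize from raw data
sol = least_squares(residuals, x0)
est_poses = sol.x[:8].reshape(4, 2)
```

Because the precise odometry and trunk factors pin down the relative geometry, the noisy GPS only has to anchor the global offset, which is the mechanism the abstract relies on.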


Understanding and Mitigating Network Latency Effect on Teleoperated-Robot with Extended Reality

Zhang, Ziliang, Liu, Cong, Kim, Hyoseung

arXiv.org Artificial Intelligence

Robot teleoperation with extended reality (XR teleoperation) enables intuitive interaction by allowing remote robots to mimic user motions with real-time 3D feedback. However, existing systems face significant motion-to-motion (M2M) latency--the delay between the user's latest motion and the corresponding robot feedback--leading to high teleoperation error and mission completion time. This issue stems from the system's exclusive reliance on network communication, making it highly vulnerable to network degradation. To address these challenges, we introduce TeleXR, the first end-to-end, fully open-sourced XR teleoperation framework that decouples robot control and XR visualization from network dependencies. TeleXR leverages local sensing data to reconstruct delayed or missing information of the counterpart, thereby significantly reducing network-induced issues. This approach allows both the XR and robot to run concurrently with network transmission while maintaining high robot planning accuracy. TeleXR also features contention-aware scheduling to mitigate GPU contention and bandwidth-adaptive point cloud scaling to cope with limited bandwidth.
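The bandwidth-adaptive point cloud scaling mentioned above can be sketched as follows. The budget formula, frame rate, and 12 bytes per point (three float32 coordinates) are illustrative assumptions, not TeleXR's actual parameters.

```python
import numpy as np

def scale_point_cloud(points, bandwidth_bps, frame_rate=30, bytes_per_point=12):
    # Keep only as many points as the current link budget allows per frame.
    budget = int(bandwidth_bps / 8 / frame_rate / bytes_per_point)
    if len(points) <= budget:
        return points
    # Uniform random subsampling; a real system might prefer voxel filtering.
    keep = np.random.default_rng(0).choice(len(points), size=budget,
                                           replace=False)
    return points[keep]

cloud = np.random.default_rng(1).uniform(-1, 1, (100_000, 3))
reduced = scale_point_cloud(cloud, bandwidth_bps=2_000_000)  # a 2 Mbit/s link
```

As the measured bandwidth changes, the same function re-scales each outgoing frame, so the stream degrades in density rather than in latency.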


DiffusionRL: Efficient Training of Diffusion Policies for Robotic Grasping Using RL-Adapted Large-Scale Datasets

Makarova, Maria, Liu, Qian, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence

Diffusion models have proven to be a powerful tool in the field of generative artificial intelligence, successfully applied in image synthesis, video generation, and audio generation [1, 2, 3, 4, 5]. Using an iterative denoising approach, these models learn to invert a diffusion process, transforming random noise into sophisticated, high-quality samples. Reinforcement Learning (RL) and Imitation Learning (IL) have become particularly popular in robot learning in recent years for the tasks of perceiving the environment and making decisions to perform actions [6]. However, RL is highly dependent on the correct tuning of hyper-parameters [7], and effective IL training requires a large amount of diverse, high-quality data [8]. Moreover, the multimodal nature of complex robot tasks hinders the construction of stable control. More recently, researchers have begun to apply diffusion policy learning to robotics as well. The concept of diffusion policy was first introduced by Chi et al. [9]. The diffusion process has been applied to robot action sequence generation because such models are able to capture the complex multimodal distributions that are characteristic of many robotics tasks, as mentioned above.
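The iterative denoising at the heart of diffusion policies can be sketched in a few lines. This toy replaces the trained noise-prediction network with an oracle that knows the clean action, so the DDPM-style reverse process collapses onto that action; the schedule and the 2-D "action" are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.2, T)   # noise schedule (illustrative values)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

target = np.array([0.3, -0.7])      # a made-up 2-D robot action

def eps_model(x, t):
    # Stand-in for a trained noise-prediction network: here it "knows" the
    # clean action exactly, so sampling should recover `target`.
    return (x - np.sqrt(abar[t]) * target) / np.sqrt(1.0 - abar[t])

x = rng.normal(size=2)              # start from pure noise
for t in range(T - 1, -1, -1):
    eps = eps_model(x, t)
    # DDPM reverse update: subtract the predicted noise, then rescale.
    x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=2)  # stochastic step
```

In a real diffusion policy, `eps_model` is a network conditioned on observations, and its multimodal predictions are what let one policy represent several valid action sequences.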


Occupancy-SLAM: An Efficient and Robust Algorithm for Simultaneously Optimizing Robot Poses and Occupancy Map

Wang, Yingyu, Zhao, Liang, Huang, Shoudong

arXiv.org Artificial Intelligence

Joint optimization of poses and features has been extensively studied and demonstrated to yield more accurate results in feature-based SLAM problems. However, research on jointly optimizing poses and non-feature-based maps remains limited. Occupancy maps are widely used non-feature-based environment representations because they effectively classify spaces into obstacles, free areas, and unknown regions, providing robots with spatial information for various tasks. In this paper, we propose Occupancy-SLAM, a novel optimization-based SLAM method that enables the joint optimization of robot trajectory and the occupancy map through a parameterized map representation. The key novelty lies in optimizing both robot poses and occupancy values at different cell vertices simultaneously, a significant departure from existing methods where the robot poses need to be optimized first before the map can be estimated. Evaluations using simulations and practical 2D laser datasets demonstrate that the proposed approach can robustly obtain more accurate robot trajectories and occupancy maps than state-of-the-art techniques with comparable computational time. Preliminary results in the 3D case further confirm the potential of the proposed method in practical 3D applications, achieving more accurate results than existing methods.
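The key coupling the abstract describes, residuals that depend on both the robot pose and the occupancy values at cell vertices, can be sketched with a bilinear map lookup. The grid, wall, and scan below are invented; a real system would feed these residuals (for every scan) to a nonlinear least-squares solver over poses and cell values jointly.

```python
import numpy as np

def bilinear(grid, x, y):
    # Continuous occupancy lookup from the values at the four surrounding
    # cell vertices; this is what makes the map a smooth function of position.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * grid[y0, x0] + dx * (1 - dy) * grid[y0, x0 + 1]
            + (1 - dx) * dy * grid[y0 + 1, x0] + dx * dy * grid[y0 + 1, x0 + 1])

def scan_residuals(pose, scan, grid, occupied=1.0):
    # Each scan endpoint, placed into the map by the robot pose, should land
    # on high occupancy; the residual depends on BOTH the pose and the cell
    # values, which is what allows optimizing them simultaneously.
    x, y, th = pose
    c, s = np.cos(th), np.sin(th)
    pts = scan @ np.array([[c, s], [-s, c]]) + np.array([x, y])
    return np.array([bilinear(grid, px, py) - occupied for px, py in pts])

grid = np.zeros((5, 5))
grid[:, 3] = 1.0                            # a wall at map column x = 3
scan = np.array([[3.0, 1.0], [3.0, 2.0]])   # endpoints observed on the wall
```

At the correct pose the residuals vanish; shifting the pose (or perturbing the cell values) makes them nonzero, giving the solver gradients in both sets of variables.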


Grid-based Submap Joining: An Efficient Algorithm for Simultaneously Optimizing Global Occupancy Map and Local Submap Frames

Wang, Yingyu, Zhao, Liang, Huang, Shoudong

arXiv.org Artificial Intelligence

Optimizing robot poses and the map simultaneously has been shown to provide more accurate SLAM results. However, for non-feature based SLAM approaches, directly optimizing all the robot poses and the whole map will greatly increase the computational cost, making SLAM problems difficult to solve in large-scale environments. To solve the 2D non-feature based SLAM problem in large-scale environments more accurately and efficiently, we propose the grid-based submap joining method. Specifically, we first formulate the 2D grid-based submap joining problem as a non-linear least squares (NLLS) form to optimize the global occupancy map and local submap frames simultaneously. We then prove that in solving the NLLS problem using the Gauss-Newton (GN) method, the increments of the poses in each iteration are independent of the occupancy values of the global occupancy map. Based on this property, we propose a pose-only GN algorithm equivalent to the full GN method to solve the NLLS problem. The proposed submap joining algorithm is very efficient due to the independence property and the pose-only solution. Evaluations using simulations and publicly available practical 2D laser datasets confirm that our proposed method outperforms state-of-the-art methods in terms of efficiency and accuracy, and can solve the grid-based SLAM problem in very large-scale environments.
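The structural idea behind a pose-only solve can be illustrated with a separable least-squares toy: when residuals are linear in the map values, the map can be eliminated in closed form and only the pose variables need iterating. This sketch is a generic variable-projection demo under invented functions and values, not the paper's equivalence proof.

```python
import numpy as np
from scipy.optimize import least_squares

# Residuals are nonlinear in the "pose" p but LINEAR in the "map" values m.
t = np.linspace(0.0, 3.0, 30)
true_p, true_m = 2.0, np.array([1.5, 0.3])

def design(p):
    # Map values enter linearly through this pose-dependent design matrix.
    return np.column_stack([np.sin(p * t), np.ones_like(t)])

b = design(true_p) @ true_m                        # synthetic observations

def reduced_residual(p):
    A = design(p[0])
    m_star = np.linalg.lstsq(A, b, rcond=None)[0]  # map solved in closed form
    return A @ m_star - b                          # residual depends on p only

sol = least_squares(reduced_residual, x0=[1.5])    # iterate over the pose only
```

The reduced problem has far fewer variables than the joint one, which is where the efficiency of a pose-only iteration comes from.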


Differentiable Robot Rendering

Liu, Ruoshi, Canberk, Alper, Song, Shuran, Vondrick, Carl

arXiv.org Artificial Intelligence

Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings. A key challenge in applying them to robotic tasks is the modality gap between visual data and action data. We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters. Our model integrates a kinematics-aware deformable model and Gaussian Splatting and is compatible with any robot form factor and degrees of freedom. We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision language models. Quantitative and qualitative results show that our differentiable rendering model provides effective gradients for robotic control directly from pixels, setting the foundation for future applications of vision foundation models in robotics.
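The core idea, gradients of a pixel loss with respect to control parameters, can be sketched with a toy planar 2-link arm whose joints render as Gaussian blobs. Everything here is invented (link lengths, image size, blob width), and finite differences stand in for the analytic gradients of the actual method.

```python
import numpy as np

H = W = 32
ys, xs = np.mgrid[0:H, 0:W].astype(float)

def render(q, l1=8.0, l2=6.0):
    # Toy "robot renderer": the two joint positions of a planar 2-link arm
    # become Gaussian blobs on the image (a crude nod to Gaussian Splatting).
    x1, y1 = 16 + l1 * np.cos(q[0]), 16 + l1 * np.sin(q[0])
    x2, y2 = x1 + l2 * np.cos(q[0] + q[1]), y1 + l2 * np.sin(q[0] + q[1])
    img = np.zeros((H, W))
    for cx, cy in [(x1, y1), (x2, y2)]:
        img += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / 8.0)
    return img

target = render(np.array([0.6, -0.4]))   # image of the desired configuration

def pixel_loss(q):
    return np.sum((render(q) - target) ** 2)

def grad(q, h=1e-4):
    # Finite differences stand in for analytic pixel-to-parameter gradients.
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (pixel_loss(q + e) - pixel_loss(q - e)) / (2 * h)
    return g

q = np.array([0.2, 0.1])
g = grad(q)
```

Stepping the joint angles against this gradient reduces the image discrepancy, which is exactly the "control directly from pixels" signal the abstract refers to.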


Combining Planning and Diffusion for Mobility with Unknown Dynamics

Ravan, Yajvan, Yang, Zhutian, Chen, Tao, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack

arXiv.org Artificial Intelligence

Manipulation of large objects over long horizons (such as carts in a warehouse) is an essential skill for deployable robotic systems. Large objects require mobile manipulation which involves simultaneous manipulation, navigation, and movement with the object in tow. In many real-world situations, object dynamics are incredibly complex, such as the interaction of an office chair (with a rotating base and five caster wheels) and the ground. We present a hierarchical algorithm for long-horizon robot manipulation problems in which the dynamics are partially unknown. We observe that diffusion-based behavior cloning is highly effective for short-horizon problems with unknown dynamics, so we decompose the problem into an abstract high-level, obstacle-aware motion-planning problem that produces a waypoint sequence. We use a short-horizon, relative-motion diffusion policy to achieve the waypoints in sequence. We train mobile manipulation policies on a Spot robot that has to push and pull an office chair. Our hierarchical manipulation policy performs consistently better, especially as the horizon increases, than a diffusion policy trained on long-horizon demonstrations or motion planning that assumes a rigidly attached object (succeeding in 8 of 10 runs, versus 0 and 5, respectively). Importantly, our learned policy generalizes to new layouts, grasps, chairs, and flooring that induces more friction, without any further training, showing promise for other complex mobile manipulation problems. Project Page: https://yravan.github.io/plannerorderedpolicy/
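The hierarchy described above can be sketched as a planner that emits waypoints and a short-horizon controller that chases them. Both pieces here are toy stand-ins: the "planner" is a straight line with no obstacles, and a proportional step replaces the relative-motion diffusion policy.

```python
import numpy as np

def plan_waypoints(start, goal, spacing=0.5):
    # High-level planner sketch: evenly spaced waypoints on the straight
    # line to the goal (the paper's planner is obstacle-aware; this is not).
    n = max(int(np.ceil(np.linalg.norm(goal - start) / spacing)), 1)
    return [start + (goal - start) * (i + 1) / n for i in range(n)]

def short_horizon_policy(pos, waypoint, gain=0.5):
    # Stand-in for the relative-motion diffusion policy: command a fraction
    # of the remaining offset to the current waypoint.
    return gain * (waypoint - pos)

pos, goal = np.array([0.0, 0.0]), np.array([3.0, 4.0])
for wp in plan_waypoints(pos, goal):
    for _ in range(20):              # a few closed-loop steps per waypoint
        pos = pos + short_horizon_policy(pos, wp)
        if np.linalg.norm(pos - wp) < 0.05:
            break
```

Because the low-level policy only ever has to reach the next nearby waypoint, its unknown-dynamics burden stays short-horizon no matter how long the overall task is.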